-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved handling of multidoc insertion errors for vector store #110
Conversation
Huh? The integration tests are able to run again? Well that's definitely good news (though I wonder what was the cause for the secrets not passing through in the first place). |
Added a cap (currently 8) on the amount of errors ending up in the message string.
|
Added a note in the error message on the original astrapy exception being accessible by chaining as |
f"following error(s): {all_err_descs}" | ||
f"{there_s_more}{original_err_note}" | ||
) | ||
raise ValueError(full_err_message) from err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValueError
should normally be for errors on the input args.
Maybe there could be a AstraDBVectorStoreError
?
(there are a few other places that could be replaced)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I would consider not creating custom stuff unless really necessary, but yours might be a good suggestion. Would you rather:
- raise a
RuntimeError
(more appropriate than ValueError, I agree) - create an
AstraDBVectorStoreError
, subclass ofRuntimeError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think good practice would be to specialize the exception, so use AstraDBVectorStoreError
.
AstraDBVectorStoreError
can be subclass of Exception
, not necessarily RuntimeError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented AstraDBVectorStoreError
and now it is raised in methods of AstraDBVectorStore that are not happening as part of the init (in a broad sense, e.g. autodetection errors are still "during init" => ValueError)
for edesc in filtered_error_descs[:MAX_SHOWN_INSERTION_ERRORS] | ||
) | ||
there_s_more: str | ||
if len(filtered_error_descs) > MAX_SHOWN_INSERTION_ERRORS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
if num_residual := len(filtered_error_descs) - MAX_SHOWN_INSERTION_ERRORS:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To walrus up the assignment correctly, it must actually be
if num_residual := max(0, len(filtered_error_descs) - MAX_SHOWN_INSERTION_ERRORS):
Slightly less readable but still OK i guess. (otherwise you get funny "Note: -7 errors omitted" or similar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right.
I think then keep it as-is, without max
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no sorry again !
It could be
if num_residual := len(filtered_error_descs) - MAX_SHOWN_INSERTION_ERRORS > 0:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahah, you are right! Silly me for not having thought of that myself. One moment ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um... wrong again? For the variable would be a boolean, but what is needed for the message is a number.
I think I will leave it as is, with apologies for the missed walrus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silly me
if (num_residual := len(filtered_error_descs) - MAX_SHOWN_INSERTION_ERRORS) > 0:
This time it should work, I tested 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eheh, I pushed this one and I agree this is our man! Tests running again
This PR alleviates a problem with the kind of error that is raised under different failure scenarios during AstraDBVectorStore insertions.
(Do not mind the CI failing. There is some bug with the CI actions related to secrets not trickling down to the test workflows. I ran the tests locally and everything related went smooth.)<== Nevermind, those CI problems somehow have vanished (by themselves?).Current problems:
the insertion is attempted assuming new document _ids. If it fails, the (detailed) errors from astrapy are inspected.
(note: "errors", plural, since a single insertion can result in a number of errors from several documents).
Now, this exception right now is the one received from Astrapy, a fact which has two flaws:
These 2 issues can combine leading to a scenario where all the user sees is a "doc already exists", adding to the confusion.
Also it can be argued that it is leaky, in this case, to expose an astrapy exception as is to the user.
What this PR does
The change is for case 2 above.
ValueErrornewly-introducedlangchain_astradb.AstraDBVectorStoreError
is raised (and not aastrapy.exception.InsertManyException
anymore - slightly breaking for try-catch code)__cause__
. A note in the very error messages reminds the user about this (see examples below).Below is a Python script exemplifying various insertion scenarios and the before- and after- kind of exception a user would get.
Demo script (before-and-after)